(54) MULTIMODAL DEVICE

(57) Abstract:

PURPOSE: To let a user perform input operations easily and reliably,
and to improve operability, by linking a voice input means with a touch
input means.
CONSTITUTION: When an approach detection part 17 detects the approach
of the user, a control part 13 classifies a large number of input items
and displays them on a display part 12. The control part 13 accepts
voice input of all items from a sound input part 11, but accepts touch
input from a touch input part 15 only for the representative items of
each classification. When one of the representative items is input by
touch, the control part 13 enlarges and displays on the screen all the
items belonging to that classification and accepts item-specifying
input from the touch input part 15.

CLAIMS
[Claim(s)]

[Claim 1] Multi-modal equipment comprising: a voice input means used
for a user to input by voice an item to be chosen and the like; a
speech recognition means for recognizing the voice input by this voice
input means; a display means used for displaying menus consisting of
various selections, various messages, and the like; a touch input means
for specifying an item by pointing, with the user's finger or the like,
to the location of the item on the screen displayed on this display
means; and a control means that controls said voice input means, said
speech recognition means, said display means, and said touch input
means and accepts input from said voice input means and said touch
input means, characterized in that, when a predetermined menu
consisting of said selections is displayed on said display means, said
control means limits the items in that menu for which a specifying
input is possible with said touch input means, and sets the number of
items for which a specifying input is possible with said touch input
means smaller than the number of items that can be input with said
voice input means.

[Claim 2] Multi-modal equipment according to claim 1, characterized in
that the items in said predetermined menu are classified and displayed,
and said control means treats the item representing each said
classification as inputtable from either said voice input means or said
touch input means, and treats the items belonging to each said
classification as inputtable only from said voice input means.

[Claim 3] Multi-modal equipment according to claim 2, characterized in
that said predetermined menu includes a classification to which only
one item belongs, that item being the item representing the
classification, and said control means treats that representative item
as inputtable from said touch input means as well.

[Claim 4] Multi-modal equipment according to claim 3, characterized in
that, when the item representing said classification in said
predetermined menu is input from said voice input means or said touch
input means, said control means displays on said display means a
lower-layer menu in which the items belonging to that classification
are set to a larger size than the same items in said predetermined
menu, and treats the items in the displayed lower-layer menu as
inputtable from said touch input means.

[Claim 5] Multi-modal equipment comprising: a voice input means used
for a user to input by voice an item to be chosen and the like; a
speech recognition means for recognizing the voice input by this voice
input means; a display means used for displaying menus consisting of
various selections, various messages, and the like; a touch input means
for specifying an item by pointing, with the user's finger or the like,
to the location of the item on the screen displayed on this display
means; and a control means that controls said voice input means, said
speech recognition means, said display means, and said touch input
means and accepts input from said voice input means and said touch
input means, characterized in that said control means displays on said
display means only a part of the items that can be input with said
voice input means, rather than all of them, and accepts item input from
said voice input means; and, when rejection of the input voice by said
speech recognition means occurs continuously or a predetermined number
of times, or when a correction request for a recognition result is
input, displays on said display means all the items that are the object
of specifying input by said touch input means and switches to accepting
item input from said touch input means.

[Claim 6] Multi-modal equipment comprising: a voice input means used
for a user to input by voice an item to be chosen and the like; a
speech recognition means for recognizing the voice input by this voice
input means; a display means used for displaying menus consisting of
various selections, various messages, and the like; a touch input means
for specifying an item by pointing, with the user's finger or the like,
to the location of the item on the screen displayed on this display
means; and a control means that controls said voice input means, said
speech recognition means, said display means, and said touch input
means and accepts input from said voice input means and said touch
input means, characterized in that said control means displays on said
display means only a part of the items that can be input with said
voice input means, rather than all of them, and accepts item input from
said voice input means; and, when no input is made within a
predetermined time, displays on said display means all the items that
are the object of specifying input by said touch input means and
switches to accepting item input from said touch input means.

[Claim 7] Multi-modal equipment comprising: a voice input means used
for a user to input by voice an item to be chosen and the like; a
speech recognition means for recognizing the voice input by this voice
input means; a display means used for displaying menus consisting of
various selections, various messages, and the like; a touch input means
for specifying an item by pointing, with the user's finger or the like,
to the location of the item on the screen displayed on this display
means; and a control means that controls said voice input means, said
speech recognition means, said display means, and said touch input
means and accepts input from said voice input means and said touch
input means, characterized in that said control means displays on said
display means only a part of the items that can be input with said
voice input means, rather than all of them, and accepts item input from
said voice input means; and, when rejection of the input voice by said
speech recognition means occurs continuously or a predetermined number
of times, when a correction request for a recognition result is input,
or when no input is made within a predetermined time, displays on said
display means all the items that can be input and continues accepting
item input from said voice input means.



* NOTICES * 

JPO and INPIT are not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation 
may not reflect the original precisely. 

2. **** shows the word which can not be translated. 
3. In the drawings, any words are not translated.



DETAILED DESCRIPTION 

[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to multi-modal
equipment that has two or more means of dialogue with a user and makes
complicated operations simpler to perform.
[0002] 

[Description of the Prior Art] In recent years, various kinds of
multi-modal equipment provided with two or more input means, such as a
voice input (speech recognition) means and a touch input means, have
been developed in order to improve operability and widen the freedom
of operation.

[0003] With this kind of multi-modal equipment, it is common to display
on the screen a list of the items that are the object of a selection
input (a menu) and to have the user input any item from it by voice
input or touch input. When there are many items, the spacing between
adjacent items (display areas) becomes small. In that case an exact
touch becomes impossible in touch input because of parallax and the
size of the finger, and it is hard to input the item the user intended.






[0004] Conventional multi-modal equipment therefore took one of two
approaches when many items were to be displayed on the screen. Either
it imposed the restriction of not accepting input from the touch input
means, which is unsuited to selecting among many items, or it reduced
the number of items displayed at once and allowed both voice input and
touch input for those items.

[0005] However, the former approach does not accept item input by the
touch input means, so the user has little freedom of operation.
Moreover, because the items are still displayed on the screen, the user
may perform a touch input by mistake, and that operation is wasted. The
latter approach, on the other hand, cannot take full advantage of the
characteristic of the voice input means that item input is possible
regardless of the number of target items.

[0006] Moreover, for some users voice input tends to produce rejections
and recognition errors in speech recognition, and such users are then
unsure what to do next. Conventional multi-modal equipment gave no
consideration to such users.
[0007] 

[Problem(s) to be Solved by the Invention] As described above,
conventional multi-modal equipment was not necessarily configured so
that its two or more input means could cooperate well, and the input
means did not function effectively. That is, the equipment was not
necessarily easy to use: it was unclear to the user which input means
to use and how to operate it, instructing the user took time, and
making the same item inputtable with different input means often
resulted in each input means restricting the other's input function.
[0008] This invention was made in consideration of the above situation.
Its object is to provide easy-to-use multi-modal equipment in which a
voice input means and a touch input means, which are different input
means, cooperate so that the characteristics of each are put to good
use, and with which the user's input operation can be performed easily
and reliably.
[0009]
[Means for Solving the Problem and its Function] The multi-modal
equipment of this invention is characterized in that, when the menu of
selections displayed on the display means is a predetermined menu (a
specific menu defined beforehand), the items in that menu for which a
specifying input is possible with the touch input means are limited,
and the number of items for which a specifying input is possible with
the touch input means is set smaller than the number of items that can
be input with the voice input means.

[0010] In such a configuration, even if the predetermined menu contains
many items, only the items whose display areas are so small that an
exact touch is impossible because of parallax and the size of the
finger are excluded from specifying input by the touch input means. The
other items, whose display areas are large, can be specified correctly
with the touch input means. Moreover, the voice input means is not
affected by the limit placed on the items that can be input with the
touch input means, so the characteristic of the voice input means,
namely that it is suited to selecting among many items, can also be put
to good use.
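
The limit described in [0010] can be sketched in Python as follows.
This is only a minimal illustration under stated assumptions, not the
implementation of the equipment: the Item type, the MIN_TOUCH_AREA
threshold, and the pixel sizes are hypothetical values introduced here.
Only the item names come from the examples in this document.

```python
from dataclasses import dataclass

# Hypothetical threshold: smallest display area (in pixels) judged safe to touch.
MIN_TOUCH_AREA = 60 * 60


@dataclass(frozen=True)
class Item:
    name: str
    width: int   # display-area width in pixels (assumed unit)
    height: int  # display-area height in pixels (assumed unit)


def voice_items(menu):
    """Every item on the menu can be input by voice."""
    return [item.name for item in menu]


def touch_items(menu, min_area=MIN_TOUCH_AREA):
    """Only items whose display area is large enough can be input by touch."""
    return [item.name for item in menu if item.width * item.height >= min_area]


# Sample layout: large classification labels, small entries inside them.
menu = [
    Item("hamburger", 200, 80),
    Item("cheeseburger", 50, 20),
    Item("juice", 200, 80),
    Item("orange juice", 50, 20),
]

print(voice_items(menu))
print(touch_items(menu))
```

With this sample layout the touch-selectable set is a strict subset of
the voice-selectable set, which is the relationship the paragraph
requires.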

[0011] Moreover, this invention is also characterized in that the items
in the above-mentioned predetermined menu are classified and displayed;
the item representing each classification is treated as inputtable from
either the voice input means or the touch input means, and the items
belonging to each classification are treated as inputtable only from
the voice input means. It is further characterized in that, when an
item representing a classification is input from the voice input means
or the touch input means, a lower-layer menu in which the items
belonging to that classification are set to a larger size than the same
items in the above-mentioned predetermined menu is displayed, and the
items in this lower-layer menu are treated as inputtable from the touch
input means.

[0012] In this way, the item representing a classification can be input
by the touch input means from the start, and once the representative
item has been input by voice or by touch, the items belonging to the
corresponding classification can also be input from the touch input
means.
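
The two-layer behavior of [0011] and [0012] can be sketched as below.
This is a hedged illustration only: the MENU dictionary, the screen
naming ("top" for the predetermined menu), and the function names are
assumptions made for this sketch. Which items can be spoken while a
lower-layer menu is shown is not fully specified in the text, so the
sketch assumes only the members of the open classification.

```python
# Hypothetical menu data; only names that appear in this document are used.
MENU = {
    "hamburger": ["cheeseburger"],
    "juice": ["orange juice"],
}


def touchable(screen):
    """Return the names that accept touch input on the given screen.

    screen is "top" for the predetermined menu, or a classification name
    for the enlarged lower-layer menu of that classification.
    """
    if screen == "top":
        return sorted(MENU)          # representative items only
    return sorted(MENU[screen])      # enlarged members of one classification


def speakable(screen):
    """Return the names that accept voice input on the given screen."""
    if screen == "top":
        names = set(MENU)            # representatives ...
        for members in MENU.values():
            names.update(members)    # ... and every member item
        return sorted(names)
    return sorted(MENU[screen])      # assumption: members of the open class


def select(screen, name):
    """Return the next screen after `name` is input by voice or by touch."""
    if screen == "top" and name in MENU:
        return name                  # open the lower-layer menu
    return screen
```

On the top screen "cheeseburger" is speakable but not touchable; after
"hamburger" is selected by either means, the lower-layer screen makes
"cheeseburger" touchable.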
[0013] Moreover, this invention is also characterized in that the
above-mentioned predetermined menu includes a classification to which
only one item belongs, that item itself representing the
classification, and that this item is treated as inputtable from the
touch input means as well.
[0014] In such a configuration, by treating a frequently chosen item as
the only item belonging to a classification on the above-mentioned
predetermined menu, that item can be chosen in a single operation by
touch input as well.

[0015] Moreover, this invention is characterized in that only a part of
the items that can be input with the voice input means is displayed,
rather than all of them, while item input from the voice input means is
accepted. When any of the following occurs, namely rejection of the
input voice by speech recognition processing occurs continuously or a
predetermined number of times, a correction request for a recognition
result is input, or no input is made within a predetermined time, all
the items that are the object of specifying input by the touch input
means are displayed and the equipment switches to accepting item input
from the touch input means.

[0016] With such a configuration, for a user whose voice tends not to
be recognized correctly, or a user who hesitates over voice input and
does not begin speaking, the equipment switches automatically to a
state in which touch input is possible, so touch input can be performed
promptly. Moreover, because not all of the items that can be input by
the voice input means are displayed, when the user already knows the
item to be input (the station name), as when this invention is applied
to a station ticket machine, the user is spared the cluttered
impression that displaying all the items would give.

[0017] Alternatively, when all the inputtable items are displayed, the
equipment may continue accepting item input from the voice input means
instead of switching to accepting item input from the touch input means
as described above. With such a configuration, when the user does not
know for certain the item to be input, the display serves as an aid to
the user's voice input.
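
The switching conditions of [0015] and the alternative of [0017] can be
sketched as a small state machine. This is an assumption-laden
illustration, not the control means itself: the class and method names,
the default of two rejections, and the ten-second timeout are
hypothetical, since the text only speaks of a "predetermined" count and
time.

```python
from enum import Enum


class Mode(Enum):
    VOICE = "voice"                 # partial display, voice input accepted
    TOUCH = "touch"                 # all touch items displayed, touch accepted
    VOICE_WITH_LIST = "voice+list"  # all items displayed, voice still accepted


class FallbackController:
    """Tracks when to leave the initial voice-only state."""

    def __init__(self, max_rejections=2, timeout_s=10.0, switch_to_touch=True):
        self.max_rejections = max_rejections    # assumed threshold
        self.timeout_s = timeout_s              # assumed threshold
        self.switch_to_touch = switch_to_touch  # False selects the [0017] variant
        self.rejections = 0
        self.mode = Mode.VOICE

    def _fall_back(self):
        # [0015]: switch to touch input; [0017]: show the list, keep voice.
        self.mode = Mode.TOUCH if self.switch_to_touch else Mode.VOICE_WITH_LIST

    def on_rejection(self):
        """Speech recognition rejected the utterance."""
        self.rejections += 1
        if self.rejections >= self.max_rejections:
            self._fall_back()

    def on_correction_request(self):
        """The user asked to correct a recognition result."""
        self._fall_back()

    def on_idle(self, seconds_since_prompt):
        """No input has arrived for the given number of seconds."""
        if seconds_since_prompt >= self.timeout_s:
            self._fall_back()
```

Keeping the choice between the two variants in one flag makes it plain
that they differ only in the state entered after the same triggers.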
[0018]
[Example]

[1st example] The first example, in which this invention is applied to
multi-modal equipment used as the order machine of a hamburger shop, is
explained below with reference to the drawings.
[0019] Drawing 1 is a block diagram showing the outline configuration
of the multi-modal equipment of this example. The multi-modal equipment
shown in drawing 1 consists of a voice input section 11, a display 12,
a control section 13, a speech recognition section 14, a touch input
section 15, a voice output section 16, and an access detection section
17.

[0020] The voice input section 11 is a handset-type audio input unit
with a built-in microphone, used to input the voice uttered by the
user. The display 12 is a CRT display or a liquid crystal display, used
to display menus consisting of various selections (input items),
various messages, and so on.
[0021] The control section 13 manages and controls the dialogue with
the user through the display 12 and the voice output section 16.
Specifically, the control section 13 handles display of the initial
screen on the display 12 according to the detection result of the
access detection section 17, control of accepting input via speech
recognition by the speech recognition section 14 and touch input from
the touch input section 15, switching of the display screen according
to the content of the accepted input (the current state), and message
output to the display 12 and the voice output section 16.
[0022] The speech recognition section 14 performs recognition
processing on the voice input from the voice input section 11. The
touch input section 15 has a touch panel laid over the screen of the
display 12. When the user, looking at the screen of the display 12,
touches the touch panel with a finger or the like at the display
position of a desired item, the touch input section 15 detects and
inputs the touched location using a change of electrostatic capacity,
interruption of infrared light, a change of gravity, or the like.
Strictly speaking, which item corresponds to the positional information
input by the touch input section 15, that is, which item the user has
chosen (input), is recognized by the control section 13 from that
positional information and the configuration information of the screen
being displayed. To simplify the explanation, however, the specifying
input of an item is described here as being performed by the touch
input section 15.
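
How a touched location might be resolved to an item from the screen's
configuration information, as [0022] describes, can be sketched as a
simple hit test. The Region type, the coordinate convention, and the
sample rectangles are assumptions made for this illustration; the text
does not specify how the configuration information is represented.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Display area of one item; right and bottom edges are exclusive."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def item_at(layout, x, y) -> Optional[str]:
    """Return the name of the item whose area contains (x, y), if any."""
    for region in layout:
        if region.left <= x < region.right and region.top <= y < region.bottom:
            return region.name
    return None  # the touch landed outside every item area


# Hypothetical layout of two classification areas on the order screen.
layout = [
    Region("hamburger", 0, 0, 100, 50),
    Region("juice", 100, 0, 200, 50),
]

print(item_at(layout, 120, 10))
```

A touch outside every region returns None, which a caller could simply
ignore.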

[0023] The voice output section 16 is a loudspeaker used to output
voice messages and the like. The access detection section 17 is an
optical sensor that detects that a user has approached the equipment.

[0024] Next, an example of the procedure for using the multi-modal
equipment configured as in drawing 1 is explained with reference to the
flow charts of drawing 2 and drawing 3 and the example display screens
of drawing 4 through drawing 6. First, when a user approaches the
equipment (step 201), the access detection section 17 detects this
(step 202) and notifies the control section 13.
[0025] The control section 13 then displays on the display 12 the order
screen (initial product-name table) shown in drawing 4 (step 203). This
screen is structured so that the many order items are divided into six
classifications, indicated by signs 401-406. In the example of drawing
4, "hamburger", "juice", "snack", "cool drink", "cutlet curry", and
"hot drink" are used as the items representing classifications 401-406
(the classification names). An area (the <order> column) for displaying
the user's order content (the accepted recognition result) is reserved
at the bottom of the screen. The drawing shows "cheeseburger, 1 piece"
as the order content, but in practice this is displayed only once the
user has placed an order and its content has been recognized. A "check"
button 407 for ending the order by touch input is displayed at the
right end of this area. In addition to the character string giving the
product name (item), a picture (mark) showing the features of the
product may be attached to make the product easier to understand.

[0026] On the screen shown in drawing 4, the user can place an order
either by item input with voice using the voice input section 11 or by
touching with a finger the location on the touch input section 15 (on
the touch panel) that corresponds to the display area of the desired
item on the screen (hereafter simply called the area of the item).

[0027] However, with voice input an item (product name) belonging to
classifications 401-406 can be specified in one utterance, whereas with
the touch input section 15 only a classification name can be specified
at first, so a desired item (product name) cannot be specified in a
single touch input.
[0028] The reason the target item cannot be specified in a single touch
input is as follows. When an ordinary small-screen display is used as
the display 12 and there are many items, as in the example of drawing
4, the size of each item (display area) and the spacing between
adjacent items become small, and an exact touch becomes impossible
because of parallax and the size of the finger. For this reason, in
this example, when a classification name is touched (except for the
special classification names described later), only that classification
is displayed enlarged on the screen so that the items (product names)
belonging to it can be input by touch.

[0029] Specifically, the user may place an order by saying
"cheeseburger" and "one piece" into the built-in microphone of the
voice input section 11 (step 204), or may touch, on the touch input
section 15, the area of "hamburger" (the area of classification 401),
which is the classification name representing "cheeseburger" and the
like (step 209).

[0030] The voice uttered by the user is input by the voice input
section 11. The speech recognition section 14 performs recognition
processing on the voice input by the voice input section 11 and passes
the recognition result to the control section 13.
[0031] The positional information of the item touched by the user, on
the other hand, is input by the touch input section 15 and passed to
the control section 13. The control section 13 recognizes the item
specified (selected) by the user from this positional information and
the configuration information of the screen being displayed.
[0032] In the above example, an order by voice can specify
"cheeseburger" in one utterance, provided the speech recognition
section 14 recognizes it correctly. An order by touch input, on the
other hand, cannot specify it in one touch, for the reason given above.
So when the area of classification 401, "hamburger", is touched as in
the above example, the control section 13 displays on the display 12 a
screen, shown in drawing 5, enlarged so that the items belonging to the
"hamburger" classification can be input by touch (step 210).
[0033] The screen shown in drawing 5 is structured to display the
"hamburger" classification enlarged, as indicated by sign 501. A number
input area 502 for specifying the quantity of ordered goods by touch
input is also reserved on this screen. A "cancellation" button 503 for
cancelling an order is provided in the number input area 502. The
display area for the user's order content is reserved at the bottom of
the screen. The drawing shows "cheeseburger, 1 piece" as the order
content, but in practice this is displayed only once the user has
placed an order and its content has been recognized. An "additional"
button 504 for adding to the order by touch input and a "check" button
505 for ending the order by touch input are displayed at the right end
of this area.
[0034] Once the screen shown in drawing 5 is displayed, the user can
place an order from the touch input section 15 by touching the desired
item (product name) under classification 501, "hamburger", for example
"cheeseburger", and then touching "1" and "piece" in succession in the
number input area 502 (step 211).
[0035] The control section 13 recognizes the product and the quantity
ordered by the user from this touch input. Here, to order juice in
addition, the user first touches the "additional" button 504 (step
301). The control section 13 then displays the screen shown in drawing
4 again (step 302).

[0036] Next, when in this state the user touches the area of "juice"
(the area of classification 402), the classification name representing
"orange juice" and the like (step 303), the control section 13 displays
on the display 12 a screen (with the same structure as drawing 5)
enlarged so that the items belonging to the "juice" classification can
be input by touch (step 304).
[0037] When the user specifies the desired product name and quantity by
touching the areas "orange juice", "1", and "piece" here (step 305),
the control section 13 recognizes that "orange juice, 1 piece" has been
ordered in addition.
[0038] To cancel the previously ordered "cheeseburger" after "orange
juice, 1 piece" has been added, the user first touches the "additional"
button on the screen of the "juice" classification (step 306), just as
when the "additional" button 504 on the screen of drawing 5 was touched
(step 301). The control section 13 then redisplays the screen of
drawing 4 (step 307).

[0039] When the user touches the area of classification 401,
"hamburger", here (step 308), the screen of drawing 5 is redisplayed
(step 309). When the user touches "cheeseburger" and "cancellation" in
this state (step 310), the control section 13 cancels the previously
accepted order of "cheeseburger, 1 piece".

[0040] When the user judges that the order is complete, the user
touches the "check" button 505 (step 311). The control section 13 then
ends acceptance of orders from the user.

[0041] In addition, an entry in the <order> column of drawing 5, such
as "cheeseburger, 1 piece", may be treated as a single input item, so
that an already ordered "cheeseburger" can be cancelled by touching the
area of "cheeseburger, 1 piece" and then the "cancellation" button 503.
In this case, the user's cancellation procedure is simplified.
[0042] With voice input, on the other hand, to order "orange juice" in
addition the user (unlike with the additional order by touch input)
need only say "orange juice" and "one piece" (step 205). Likewise, to
cancel the previously ordered "cheeseburger", the user need only say
"cheeseburger" and "cancellation" (step 206). The end of the order can
be indicated here either by saying "check" (step 207) or by touching
the "check" button (step 208).
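
The order bookkeeping implied by the add and cancel operations above
can be sketched as follows. This is a hypothetical illustration: the
dictionary representation of the order and the apply_command function
are assumptions made here, and the text does not say how the control
section 13 stores the order. The same function serves both input means,
since voice and touch arrive at the same item and action.

```python
def apply_command(order, item, action, quantity=1):
    """Return a new order after one command; the input order is not modified.

    order maps a product name to its ordered quantity.
    action is "add" (order the item) or "cancel" (remove the item).
    """
    updated = dict(order)
    if action == "add":
        updated[item] = updated.get(item, 0) + quantity
    elif action == "cancel":
        updated.pop(item, None)  # cancelling an absent item is a no-op
    else:
        raise ValueError(f"unknown action: {action}")
    return updated


# Replay of the sequence in the text: order, add, then cancel the first item.
order = {}
order = apply_command(order, "cheeseburger", "add", 1)
order = apply_command(order, "orange juice", "add", 1)
order = apply_command(order, "cheeseburger", "cancel")
print(order)
```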
[0043] In the example of drawing 4 the many order items are divided
into six classifications 401-406. When a single product name (item) is
used as it is as a classification name (the item representing a
classification), as with "cutlet curry", it can be specified with a
single touch of the classification name "cutlet curry" (its area),
unlike other classification names such as "hamburger".
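
The special handling of a single-item classification described in
[0043] can be sketched as below. The MENU dictionary and the
on_touch_top_menu function are assumptions for illustration; the text
does not say how the equipment marks such a classification internally.

```python
# Hypothetical menu data; "cutlet curry" is a classification whose only
# member is the item itself.
MENU = {
    "hamburger": ["cheeseburger"],
    "cutlet curry": ["cutlet curry"],
}


def on_touch_top_menu(name):
    """Decide what a touch on a classification name should do.

    Returns ("select", item) when the classification is itself the only
    item, or ("expand", classification) when the enlarged lower-layer
    menu has to be shown first.
    """
    members = MENU[name]
    if members == [name]:
        return ("select", name)
    return ("expand", name)


print(on_touch_top_menu("cutlet curry"))
print(on_touch_top_menu("hamburger"))
```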

[0044] Moreover, when any one of the classification names, for example
"juice", is uttered while the screen shown in drawing 4 is displayed,
only the items belonging to the juice classification can be displayed
enlarged, as shown in drawing 6. While the screen shown in drawing 6 is
displayed, to order "orange juice, 1 piece", for example, the user may
say "orange juice" and "one piece", or may touch "orange juice", "1",
and "piece".

[2nd example] Next, the second example, in which this invention is
applied to multi-modal equipment used as the ticket machine of a
station, is explained with reference to the flow chart of drawing 7 and
the example display screens of drawing 8 through drawing 12. Because
the outline configuration of this equipment is the same as that of the
equipment of the first example shown in drawing 1, drawing 1 is also
used for convenience in explaining the second example.

[0045] First, when a user approaches the equipment (step 701) and the
access detection section 17 detects this (step 702), the control
section 13 displays on the display 12 the screen (initial screen) shown
in drawing 8 (step 703). Individual station names are not displayed on
this screen. The reason is that, unlike with the hamburger-shop order
machine explained in the first example, the user generally knows the
name of the station to which he wants to go, so there is no need to
display it. On the contrary, displaying every station name would in
many cases give the user a cluttered impression.

[0046] When the initial screen shown in drawing 8 is displayed, the
user follows the guidance message ("Where to?") and says into the
built-in microphone of the voice input section 11, for example,
"Asakusabashi" and "one sheet" (or, if the user is a child,
"Asakusabashi", "child", and "one sheet") (step 704).
[0047] The voice which the user uttered is inputted by the voice input 
section 11. The speech recognition section 14 performs recognition 
processing of the voice "Asakusabashi" and "one sheet" inputted by the 
voice input section 11 (step 705), and passes the recognition result to 
the control section 13. On receiving this speech recognition result, the 
control section 13 displays on the screen a recognition result message 
containing, for example, "Asakusabashi, adult, 1 sheet", as shown in 
drawing 9 (step 706). 
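
The step from the recognized words of paragraph [0047] to the message of 
drawing 9 can be sketched as follows. This is only an illustration under 
stated assumptions, not the implementation of this equipment: the names 
`TicketRequest` and `build_request`, the small vocabulary, and the rule 
that the passenger type defaults to "adult" when "child" is not uttered 
are all hypothetical, the last being inferred from the example message. 

```python
from dataclasses import dataclass

# Hypothetical vocabulary; the source does not list the recognizer's words.
COUNT_WORDS = {"one sheet": 1, "two sheets": 2, "three sheets": 3}
PASSENGER_WORDS = {"adult", "child"}


@dataclass
class TicketRequest:
    destination: str
    passenger: str = "adult"  # assumed default when no type is uttered
    sheets: int = 1

    def message(self):
        """Text of the recognition result message shown to the user."""
        unit = "sheet" if self.sheets == 1 else "sheets"
        return f"{self.destination}, {self.passenger}, {self.sheets} {unit}"


def build_request(recognized):
    """Build a request from recognized words.

    Count and passenger words are matched against the vocabularies above;
    the first remaining word is taken as the destination.
    """
    destination = None
    passenger = "adult"
    sheets = 1
    for word in recognized:
        if word in COUNT_WORDS:
            sheets = COUNT_WORDS[word]
        elif word in PASSENGER_WORDS:
            passenger = word
        elif destination is None:
            destination = word
    if destination is None:
        raise ValueError("no destination recognized")
    return TicketRequest(destination, passenger, sheets)


print(build_request(["Asakusabashi", "one sheet"]).message())
```

With the utterance of step 704 as input, the sketch produces the same 
wording as the example message of step 706. 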

[0048] If the user can judge from the screen shown in drawing 9 that the 
destination (and number of sheets) has been recognized correctly, the 
user touches the "check" button 902 provided on the screen (step 707). 
The control section 13 then calculates the fare and displays a message 
requesting payment of the fare from the user. 

[0049] On the other hand, when the correct recognition result is not 
displayed, the user touches the "correction" button 901 provided on the 
screen of drawing 9 (step 708). The control section 13 then redisplays 
the initial screen shown in drawing 8 (step 709). However, the 
"welcome" message part is erased. 
[0050] Here, the user utters "Asakusabashi" and "one sheet" again, that 
is, performs voice input again. If a correct recognition result as shown 
in drawing 9 is still not displayed, the user touches the "correction" 
button 901 again (step 710). 

[0051] The control section 13 counts the number of corrections made by 
the present user. When "correction" has been touched a predetermined 
number of times, twice in this example, the control section 13 judges 
that the items (here the destination, number of sheets, etc.) cannot be 
recognized correctly even if voice input is repeated further, switches 
to the display screen shown in drawing 10, and switches to accepting 
item input by touch (step 711). The display screen of drawing 10 has a 
screen structure in which a station name can be entered by touching the 
initial of the station name. 
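
The correction counting of steps 708 to 711 can be sketched as a small 
state holder. This is a minimal sketch, assuming hypothetical names 
(`Mode`, `CorrectionTracker`) that do not appear in the source; only the 
limit of 2 and the clearing of the count on "check" come from the text. 

```python
from enum import Enum


class Mode(Enum):
    VOICE = "voice"
    TOUCH = "touch"


class CorrectionTracker:
    """Counts "correction" touches and switches to touch input at a limit."""

    def __init__(self, limit=2):  # 2 is the count used in this example
        self.limit = limit
        self.count = 0
        self.mode = Mode.VOICE

    def on_correction(self):
        """Called when the user touches "correction" (steps 708 and 710)."""
        self.count += 1
        if self.count >= self.limit:
            self.mode = Mode.TOUCH  # step 711: accept item input by touch
        return self.mode

    def on_check(self):
        """Called when the user touches "check"; clears the count only."""
        self.count = 0


tracker = CorrectionTracker()
print(tracker.on_correction())  # first correction: voice input continues
print(tracker.on_correction())  # second correction: switch to touch input
```

Keeping the limit as a constructor argument reflects that the text 
calls the count "predetermined" and gives 2 only as an example. 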

[0052] In addition, when the user's voice input is rejected by the 
speech recognition section 14 a predetermined number of times, or when 
no single candidate can be decided on (namely, when a speech recognition 
result cannot be obtained), the same treatment as in the case where the 
above-mentioned count of "correction" reaches the predetermined count 
(2 times) may be carried out, and the equipment may switch to touch 
input. 

[0053] Moreover, when there is no voice input even after a certain fixed 
time has passed since the initial screen of drawing 8 was displayed 
(because the user does not clearly remember the name of the destination 
station, or does not know how to perform voice input), the same 
treatment as in the case where the above-mentioned count of "correction" 
reaches the predetermined count may be carried out, and the equipment 
may switch to touch input. Of course, the equipment may first display 
the initial screen again to call the user's attention, and only then 
switch to touch input. 
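
The three conditions for leaving voice input described in paragraphs 
[0051] to [0053] can be gathered into one decision function. The sketch 
below is illustrative only: `FallbackPolicy`, `next_action`, the 
rejection limit, and the 10 second timeout are assumptions, since the 
text says only "a count of predetermined" and "a certain fixed time". 

```python
from dataclasses import dataclass


@dataclass
class FallbackPolicy:
    max_corrections: int = 2         # count given in this example
    max_rejections: int = 2          # assumed equal to the correction count
    silence_timeout_s: float = 10.0  # hypothetical value for the fixed time
    remind_first: bool = True        # variant: redisplay the initial screen once


def next_action(policy, corrections, rejections, silent_for_s, reminded):
    """Decide what the control section does next.

    Returns "switch_to_touch", "redisplay_initial_screen", or "keep_voice".
    """
    if corrections >= policy.max_corrections:
        return "switch_to_touch"
    if rejections >= policy.max_rejections:
        return "switch_to_touch"
    if silent_for_s >= policy.silence_timeout_s:
        # Optionally call the user's attention once before switching.
        if policy.remind_first and not reminded:
            return "redisplay_initial_screen"
        return "switch_to_touch"
    return "keep_voice"


policy = FallbackPolicy()
print(next_action(policy, corrections=0, rejections=0,
                  silent_for_s=12.0, reminded=False))
```

Treating the reminder as a flag keeps both variants of paragraph [0053] 
available: with `remind_first=False` a timeout switches to touch input 
at once. 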

[0054] Now, when the screen shown in drawing 10 is displayed, the user 
touches the initial "**" of the desired station name "Asakusabashi" 
(step 712). The control section 13 then displays the list for touch 
input, whose input items are the station names having that initial, as 
shown in drawing 11 (step 713). 
[0055] Then, the user touches "Asakusabashi" (step 714). A 
number-of-sheets input area similar to the number input area 502 in 
drawing 5 (omitted from the diagram) is secured in the list screen of 
drawing 11, and by using this area, touch input of the number of 
tickets and, further, of "child" (or adult/child) is possible. 
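
The two-level selection of steps 712 to 714 (touch an initial, then 
touch a name from the resulting list) amounts to an index keyed by 
initial. The following is a rough sketch under assumptions: the sample 
station list other than "Asakusabashi" is hypothetical illustration 
data, romanized first letters stand in for the initials on the screen of 
drawing 10, and the function names are invented for this example. 

```python
from collections import defaultdict

# Hypothetical sample data; only "Asakusabashi" appears in the source.
STATIONS = ["Asakusabashi", "Akihabara", "Ichigaya", "Iidabashi", "Ochanomizu"]


def index_by_initial(names):
    """Group station names under their first character, each group sorted."""
    index = defaultdict(list)
    for name in names:
        index[name[0].upper()].append(name)
    return {initial: sorted(group) for initial, group in sorted(index.items())}


def stations_for(index, initial):
    """Return the list shown after the user touches an initial (step 713)."""
    return index.get(initial.upper(), [])


index = index_by_initial(STATIONS)
print(list(index))               # initials offered on the first screen
print(stations_for(index, "A"))  # names offered on the second screen
```

Limiting the first screen to initials keeps the number of touch targets 
small, which is the point of restricting touch input to a few items. 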
[0056] The result of the touch input by the user is displayed on a 
screen similar to drawing 9, and if the result is as the user desired, 
the user touches the "check" button. The count of "correction" is 
cleared by the touch of this "check" button. Moreover, the count of 
"correction" may also be cleared upon the switch to the above-mentioned 
touch input. 
[0057] In addition, instead of switching to touch input as mentioned 
above, a list of all station names as shown in drawing 12 may be 
displayed on the screen as help for the user's voice input (when the 
list cannot be displayed at once, the screen is switched by next-page 
designation). This method is suitable when the user does not clearly 
remember the name of the destination station. 
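
The page switching mentioned in paragraph [0057] can be sketched as 
below. This is a minimal sketch with assumed names (`paginate`, 
`next_page`); the page size and the wrap-around from the last page to 
the first are assumptions not stated in the source. 

```python
def paginate(names, page_size):
    """Split the full station list into pages of at most page_size names."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [names[i:i + page_size] for i in range(0, len(names), page_size)]


def next_page(current, page_count):
    """Index of the page shown after a next-page touch.

    Wrapping to the first page after the last one is an assumption.
    """
    return (current + 1) % page_count


pages = paginate(["a", "b", "c", "d", "e"], page_size=2)
print(len(pages))                 # number of screens needed
print(next_page(2, len(pages)))   # page shown after the last page
```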
[0058] 

[Effect of the Invention] As explained in detail above, according to 
this invention, the two input means can be made to cooperate so that the 
features of each are used effectively. The voice input means is suitable 
for choosing and inputting a desired item out of many items, but the 
content of its input is hard to recognize correctly for some users. The 
touch input means is not suitable for selecting an item out of many 
items, but allows simple and reliable input regardless of the user when 
choosing from a small number of items. As a result, the user's input 
operation can be carried out easily and reliably, and user-friendliness 
can be improved. 
DESCRIPTION OF DRAWINGS 
[Brief Description of the Drawings] 

[Drawing 1] The block diagram showing the outline configuration of the 
multi-modal equipment concerning the 1st example of this invention. 
[Drawing 2] Drawing showing a part of flow chart for explaining an 
example of the utilization procedure in this example. 
[Drawing 3] Drawing showing the remainder of the flow chart for 
explaining an example of the utilization procedure in this example. 
[Drawing 4] Drawing showing the example of a screen display at the time 
of classifying and displaying many items in this example. 
[Drawing 5] Drawing showing the example of the display screen at the 
time of displaying greatly the item belonging to one classification chosen 
by the touch input in this example. 

[Drawing 6] Drawing showing the example of the display screen at the 
time of displaying greatly the item belonging to one classification chosen 
by voice input in this example. 

[Drawing 7] The flow chart for explaining an example of the utilization 
procedure in the 2nd example of this invention. 

[Drawing 8] Drawing showing an example of the initial screen format in 
this example. 

[Drawing 9] Drawing showing an example of the recognition result display 
screen in this example. 

[Drawing 10] Drawing showing an example of the screen for a touch 
input in this example. 

[Drawing 11] Drawing showing an example of the screen for a touch 
input which forms the lower layer of the screen of drawing 10. 
[Drawing 12] Drawing showing an example of the screen for voice input 
in this example. 
[Description of Notations] 

11: The voice input section, 12: A display, 13: A control section, 
14: The speech recognition section, 15: The touch input section, 
16: The voice output section, 17: Access detection section. 
DRAWINGS 
[Drawing 1] 



file://C:¥Documents%20and%20Settings¥saeko¥My%20Documents¥JPOEn¥JP-A-... 2007/02/15 




[Drawing 8] 




[Drawing 2] 

[Drawing 3] 




[Drawing 4] 
[Drawing 6] 
[Drawing 11] 
[Drawing 12] 
[Drawing ll 